DEDICATION



This paper is dedicated to the memory of Sidney Suslow, 
a founding member of the Association of Institutional Research, 
and a man whose constant energy went into the support of its 
purposes and goals. It was with Sid's support and encouragement
that we pursued our interest in higher educational
planning, and his pioneering work in obtaining longitudinal 
data on students led directly to our work in the study of 
longitudinal models. 



0. Introduction 



In 1968 Sidney Suslow, together with his colleagues in 
the Office of Institutional Research at the Berkeley Campus of 
the University of California, completed a study (Suslow et al.
[4]) of undergraduate student attendance patterns over time.

That report contains some of the earliest data the authors had 
seen on a given group, or cohort, of students, and how the group 
behaved over its undergraduate career. Most institutions keep 
only cross-sectional data obtained from enrollment statistics. 

It was the availability of the Suslow data that led the authors 
to pursue the formulation and analysis of enrollment models 
based on longitudinal student attendance patterns. The authors 
presented a constant-work model (Marshall and Oliver [2]) which 
explained the data quite successfully. They also, together
with Suslow in [3], tried to find cross-sectional Markovian
models to fit the longitudinal data (this latter work is reproduced
in a shortened form in Chapter 2 of Grinold and Marshall
[1], which is perhaps more accessible than [3]).

The purpose of this paper is to demonstrate how the 
longitudinal data can be used to determine variances, and hence 
confidence bounds, on student enrollment forecasts in addition 
to finding the forecasts themselves. Thus with each forecast 
we have a measure of the error that could be present. 
1. Model Formulation



We consider discrete points in time such as the beginning 
of a quarter, semester, or academic year. The particular choice 
depends on the model use and the availability of data. In our 
numerical examples we use the data from Suslow et al. [4], and
hence our time points coincide with semesters. Thus when we write
t = 1,2,3,..., we mean the start of the first, second, third, etc.
semesters in the future; t = 0 will refer to the point "now" from
which forecasts are being made, and t = -1, -2, -3, ... will refer to
the first, second, third, etc. semesters in the past.

Our first aim is to derive an expression for the expected 
number in attendance at some time t > 0. We do not differentiate 
groups such as freshmen, sophomores, or lower division, upper 
division. This could easily be done by placing subscripts on our 
notation, but we choose to simplify the notation to be consistent 
with the Suslow data on total student attendance. 

Let S(t;u) be the number of students in attendance at 
time t who entered (for the first time) at time t - u, 
u = 0,1,... . Let S(t) be the total number of students in 
attendance at time t. Then 

$$S(t) = S(t;0) + S(t;1) + S(t;2) + \cdots + S(t;u) + \cdots . \qquad (1)$$

The data in [4] showed that for the periods studied
(1950's and 1960's) there was very stable behavior in student
attendance; the fraction of students who attended a given semester
after entrance was independent of when the students first entered.
However, only fall-entering cohorts were studied. We assume here
that stable behavior could be expected from spring-entering
cohorts also, but that fall- and spring-entering students could
have different continuation fractions. Let p_1(u) be the probability
that a student attends at time u after entering in the
fall, independent of the particular entrance time. Let p_2(u)
be the equivalent probability for spring-entering students. We also
assume that the attendance of any given student is independent
of the attendance or non-attendance of any other student; i.e.,
all students act independently of each other. Table 1 gives
p_1(u) as determined by Suslow et al. in [4].

Let N(t) be the number of new students who enter at 
time t. The above two assumptions imply that the value of (the 
random variable) S(t;u), given the value of N(t-u), has a 
binomial probability distribution. That is, 

$$\Pr[S(t;u) = k \mid N(t-u) = m] = \binom{m}{k}\, p_i(u)^{k}\, [1 - p_i(u)]^{m-k} , \qquad (2)$$

for k = 0,1,...,m, and m ≥ 0, where i = 1 for fall students
and i = 2 for spring students. In particular the conditional
expectation and the conditional variance of S(t;u) are given
respectively by

$$E[S(t;u) \mid N(t-u) = m] = m\, p_i(u) , \qquad (3)$$

$$\mathrm{Var}[S(t;u) \mid N(t-u) = m] = m\, p_i(u)\, [1 - p_i(u)] . \qquad (4)$$
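As a quick numerical check of (3) and (4), the following minimal
Python sketch evaluates the binomial distribution (2) directly and
compares its mean and variance with m p_i(u) and m p_i(u)[1 - p_i(u)].
The cohort size m = 50 is an illustrative assumption, p = 0.905 is the
Table 1 value for u = 2, and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, m, p):
    """Probability of k attendees out of a cohort of m, as in equation (2)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def conditional_moments(m, p):
    """Mean and variance of S(t;u) given N(t-u) = m, summed from the pmf."""
    pmf = [binomial_pmf(k, m, p) for k in range(m + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


m, p = 50, 0.905  # illustrative cohort size; p_1(2) from Table 1
mean, var = conditional_moments(m, p)
print("mean from pmf:", mean, " formula (3):", m * p)
print("var from pmf: ", var, " formula (4):", m * p * (1 - p))
```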
  u      p_1(u)    p_1(u)(1 - p_1(u))    p_1(u)^2

  0      1.0         0.0                 1.0
  1       .972        .0272               .9448
  2       .905        .0860               .8190
  3       .756        .1845               .5715
  4       .684        .2161               .4679
  5       .593        .2414               .3516
  6       .562        .2462               .3158
  7       .524        .2494               .2746
  8       .498        .2500               .2480
  9       .199        .1594               .0396
 10       .130        .1131               .0169
 11       .050        .0475               .0025
 12       .036        .0347               .0013
 13       .017        .0167               .0003
 14       .015        .0148               .0002
 15       .011        .0109               .0001
 16       .007        .0070               .0000

 Total   6.959       1.905               5.054


TABLE 1: Sample student attendance data from Suslow et al. [4].
Let t be the start of a fall semester. After taking
expectations in (1) and using (3), the expected total enrollment
at time t is

$$E[S(t)] = \sum_{u=0}^{\infty} p_{i(u)}(u)\, E[N(t-u)] . \qquad (5)$$

Here we have let

$$i(u) = \begin{cases} 1 & \text{if } u = 0,2,4,6,\ldots \\ 2 & \text{if } u = 1,3,5,7,\ldots . \end{cases}$$


For any two random variables X and Y the expression

$$\mathrm{Var}[X] = E[\mathrm{Var}[X \mid Y]] + \mathrm{Var}[E[X \mid Y]]$$

holds. We use this together with (1), (3) and (4) to obtain for
the variance of the total enrollment at time t,

$$\mathrm{Var}[S(t)] = \sum_{u=0}^{\infty} \Big\{ E[N(t-u)]\, p_{i(u)}(u)\, [1 - p_{i(u)}(u)] + p_{i(u)}(u)^{2}\, \mathrm{Var}[N(t-u)] \Big\} . \qquad (6)$$

Equations (5) and (6) give the expected enrollment and
its variance at time t. Recall that t is a fall semester.
For the case when t is a spring semester we use

$$i(u) = \begin{cases} 2 & \text{if } u = 0,2,4,6,\ldots \\ 1 & \text{if } u = 1,3,5,7,\ldots . \end{cases}$$






These expressions do not take into account the fact
that we have knowledge of enrollments up to time t = 0 (the
current time in our timing convention). In (5) we know the
values of N(0), N(-1), N(-2), etc. and thus our forecast for
t > 0 becomes

$$E[S(t) \mid N(0), N(-1), \ldots] = \sum_{u=t}^{\infty} p_{i(u)}(u)\, N(t-u) + \sum_{u=0}^{t-1} p_{i(u)}(u)\, E[N(t-u)] , \qquad (7)$$

where i(u) is defined above for the particular case that t
is either fall or spring. The first summation term in equation
(7) gives the expected "legacy" at time t of the given inputs
up to and including the current time zero. The second summation
gives the expected enrollment at time t from the expected input
of new students at times 1, 2, ..., t.

Similarly, by using equation (6), the variance of the
forecast at t, given inputs up to and including time zero,
becomes

$$\mathrm{Var}[S(t) \mid N(0), N(-1), \ldots] = \sum_{u=t}^{\infty} p_{i(u)}(u)\, [1 - p_{i(u)}(u)]\, N(t-u) + \sum_{u=0}^{t-1} \Big\{ p_{i(u)}(u)\, [1 - p_{i(u)}(u)]\, E[N(t-u)] + p_{i(u)}(u)^{2}\, \mathrm{Var}[N(t-u)] \Big\} . \qquad (8)$$

The first summation gives the contribution to the variance from
the inputs up to and including the present. The second summation
gives the contribution which will occur from future inputs. Note
that this depends on the variance of the new inputs for times
1,2,...,t as well as the variance due to returning students.
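The forecast (7) and its variance (8) can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions,
not the authors' program: the function name and argument layout are
hypothetical, the sums are truncated at u = 16 because Table 1 ends
there, and p_1(u) = p_2(u) is assumed so that the index i(u) drops out.

```python
# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def forecast(t, past, future_mean, future_var):
    """Return (mean, variance) of S(t) given known past inputs.

    past[j]          = N(-j), j = 0, 1, 2, ... (known cohort sizes)
    future_mean[s-1] = E[N(s)],   s = 1, ..., t
    future_var[s-1]  = Var[N(s)], s = 1, ..., t
    """
    mean = 0.0
    var = 0.0
    for u, p in enumerate(P):
        s = t - u  # semester in which this cohort entered
        if s <= 0:
            # Known cohort: first sums in (7) and (8).
            n = past[-s] if -s < len(past) else 0.0
            mean += p * n
            var += p * (1 - p) * n
        else:
            # Future cohort: second sums in (7) and (8).
            mean += p * future_mean[s - 1]
            var += p * (1 - p) * future_mean[s - 1] + p * p * future_var[s - 1]
    return mean, var


# t = 0 is a fall semester; past inputs alternate 3000 (fall), 1000 (spring).
past = [3000 if j % 2 == 0 else 1000 for j in range(17)]
mean, var = forecast(2, past, future_mean=[1000, 3000], future_var=[0, 0])
print(round(mean), round(var))
```

Passing a nonzero entry in future_var shows directly how uncertainty
in new admissions adds to the variance contributed by returning
students.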

Table 1 gives data for p_1(u), u ≥ 0, obtained originally
in the study of Suslow et al. [4], and reproduced on page 66 of
[1]. The third and fourth columns give p_1(u)(1 - p_1(u)) and
p_1(u)^2 respectively. These data are required in equation (8),
whereas the data in column 2 are required in equation (7).

The usual interpretation given to the second column in
Table 1 is simply the fraction of attending students out of a
given cohort. The third column is the variance of the S(t;u)
terms divided by N(t-u). It is interesting to see how the
conditional expectation and the variance of the number of attending
students vary with the number of time periods that have
elapsed since initial registration. As one might expect, the
fraction of students out of a given cohort that return to attend
decreases rapidly and there is a sharp drop of attendance after
eight semesters. By the end of the 12th semester the fraction
of attending students decreases to a number less than 4% of
the original cohort. However, the conditional variance of the
number returning first increases, has its maximum when seven or
eight semesters have elapsed and then decreases to a negligible
amount by the end of the 12th semester. About the 12th semester,
the conditional expectation and variance of the number attending
are about equal; this result is not surprising, if we recall
that the Poisson distribution (whose variance and mean are equal)
is a good approximation to the binomial distribution when the
probability p(u) is small. Thus, students returning after
10 periods can be classified as "rare" events in the sense that
while the probability that an individual student attends is
small the original cohort is large enough so that the probability
distribution of returning students is approximately Poisson. By
similar arguments one can deduce that the number who do not attend
in the first few semesters is also approximately Poisson distributed.
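The quality of the Poisson approximation can be checked numerically.
The sketch below is illustrative only: the cohort size of 3000 and
the helper names are assumptions, and the total variation distance is
our choice of closeness measure rather than anything used in [4]. It
compares the binomial distribution with a Poisson distribution of the
same mean for the Table 1 value p_1(12) = .036, and reports the
variance-to-mean ratio 1 - p(u), which approaches 1 as p(u) becomes
small.

```python
from math import exp, lgamma, log


def log_binomial_pmf(k, m, p):
    """Log of the binomial probability, computed with lgamma to avoid overflow."""
    return (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
            + k * log(p) + (m - k) * log(1 - p))


def log_poisson_pmf(k, lam):
    """Log of the Poisson probability with mean lam."""
    return k * log(lam) - lam - lgamma(k + 1)


def total_variation(m, p, k_max):
    """Half the summed absolute pmf difference over k = 0..k_max (k_max <= m)."""
    lam = m * p
    return 0.5 * sum(
        abs(exp(log_binomial_pmf(k, m, p)) - exp(log_poisson_pmf(k, lam)))
        for k in range(k_max + 1)
    )


def variance_to_mean(p):
    """Binomial variance divided by binomial mean, i.e. 1 - p."""
    return 1 - p


m = 3000   # assumed cohort size
p12 = 0.036  # p_1(12) from Table 1
tv = total_variation(m, p12, k_max=400)  # mass beyond k = 400 is negligible here
print("variance/mean at u = 12:", variance_to_mean(p12))
print("variance/mean at u = 8: ", variance_to_mean(0.498))
print("total variation distance at u = 12:", tv)
```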

Consider a simple system where there is no variance in
the new student input, which is a fixed amount, say n_1, in each
fall semester, and a fixed amount n_2 in each spring semester.
Thus E[N(t)] = n_i and Var[N(t)] = 0 for all t, where i = 1
for a fall semester and i = 2 in the spring. Using these in
(7) and (8), and assuming p_1(u) = p_2(u) with the data in Table 1,
we obtain

$$E[S(t)] = 3.837\, n_1 + 3.122\, n_2 , \qquad \mathrm{Var}[S(t)] = 0.968\, n_1 + 0.937\, n_2$$

for t a fall semester, and

$$E[S(t)] = 3.837\, n_2 + 3.122\, n_1 , \qquad \mathrm{Var}[S(t)] = 0.968\, n_2 + 0.937\, n_1$$

for t a spring semester. The coefficients are the sums of the
second and third columns of Table 1 over even and odd values of u
respectively. All these expressions are independent
of t because of the constant input each period.
Table 2 illustrates the use of these equations for three
combinations of fall and spring input totalling 4000 per year,
and assuming p_1(u) = p_2(u) as given in Table 1.

                        Expected      Variance of
Semester     Input      Enrollment    Enrollment

Fall         4,000      15,348        3,872
Spring           0      12,488        3,748

Fall         3,000      14,633        3,841
Spring       1,000      13,203        3,779

Fall         2,000      13,918        3,810
Spring       2,000      13,918        3,810


TABLE 2: Illustrative calculations for differing fall/spring
input values.
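The entries of Table 2 follow directly from Table 1. The sketch
below, with hypothetical function names and assuming p_1(u) = p_2(u),
sums the Table 1 columns over even and odd u to recover the four
coefficients and then evaluates the steady-state mean and variance
for a given pair of fall and spring inputs. Because it uses the
unrounded products p(1 - p), the variances can differ from the
tabulated ones by about one student.

```python
# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def coefficients():
    """Sums of p and p(1 - p) over even u and over odd u."""
    even, odd = P[0::2], P[1::2]
    return (sum(even), sum(odd),
            sum(p * (1 - p) for p in even), sum(p * (1 - p) for p in odd))


def steady_state(n_fall, n_spring):
    """(mean, variance) of enrollment in a fall and in a spring semester."""
    e_same, e_other, v_same, v_other = coefficients()
    fall = (e_same * n_fall + e_other * n_spring,
            v_same * n_fall + v_other * n_spring)
    spring = (e_same * n_spring + e_other * n_fall,
              v_same * n_spring + v_other * n_fall)
    return fall, spring


for n_fall, n_spring in [(4000, 0), (3000, 1000), (2000, 2000)]:
    fall, spring = steady_state(n_fall, n_spring)
    print(f"Fall   {n_fall:5d}  {fall[0]:9.0f}  {fall[1]:7.0f}")
    print(f"Spring {n_spring:5d}  {spring[0]:9.0f}  {spring[1]:7.0f}")
```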



A fairly typical use for Equations (7) and (8) is that
of forecasting one period into the future. With the convention
that t = 0 represents today (the start of a fall semester),
we obtain the next period forecast

$$E[S(1) \mid N(0), N(-1), \ldots] = \sum_{u=1}^{\infty} p_{i(u)}(u)\, N(1-u) + E[N(1)] ,$$

with i(u) = 2 for u even, i(u) = 1 for u odd (since t = 1 is a
spring semester), and provided p_i(0) = 1. The first (summation)
term represents the expected number of returning students and the
second term represents the expected number of new admissions. The
corresponding expression for the variance of enrollments in the
next period is

$$\mathrm{Var}[S(1) \mid N(0), N(-1), \ldots] = \sum_{u=1}^{\infty} p_{i(u)}(u)\, [1 - p_{i(u)}(u)]\, N(1-u) + \mathrm{Var}[N(1)] .$$


In this case, where we assume all entering students in fact
show up, the fluctuations are due either to the uncertainty
in the count of returning students already enrolled or to the
uncertainty in the new students. Thus one can obtain some idea
of where new forecasting efforts should be directed. In certain
institutions the dominant problem may be the uncertainties
associated with returning students rather than with new students.
If, for example, the past cohorts were approximately 3000 in
each fall and 1000 in each spring, but the next group of entering
students were Poisson with expected number and variance
equal to 1000, then we would have (from Table 2)

$$\mathrm{Var}[S(1) \mid N(0), N(-1), \ldots] = 3779 + 1000 = 4779 .$$

In this case, two standard deviations (a measure of error often
used and based on Normal distribution theory) would be 138 students,
which is slightly larger than the value we obtain when all
admissions are constant (2 × √3779 ≈ 123 from Table 2). In
other words it is possible to make various assumptions about the
uncertainty of future enrollments and/or returning students and
easily include them in our estimates of enrollment fluctuations.
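The arithmetic of this example can be reproduced with the short
sketch below. It is a minimal illustration with hypothetical names,
assuming p_1(u) = p_2(u) from Table 1, t = 0 a fall semester, and
past inputs of exactly 3000 each fall and 1000 each spring.

```python
from math import sqrt

# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def returning_variance(n_fall, n_spring):
    """Variance of returning students at t = 1 (a spring semester).

    The cohort with elapsed time u entered at time 1 - u, which is a
    fall semester when u is odd and a spring semester when u is even.
    """
    total = 0.0
    for u in range(1, len(P)):
        n = n_fall if u % 2 == 1 else n_spring
        total += P[u] * (1 - P[u]) * n
    return total


var_returning = returning_variance(3000, 1000)
var_new = 1000.0  # Poisson admissions: variance equals the mean of 1000

two_sigma_poisson = 2 * sqrt(var_returning + var_new)
two_sigma_fixed = 2 * sqrt(var_returning)
print(round(var_returning), round(two_sigma_poisson), round(two_sigma_fixed))
```

Comparing the two terms of the variance in this way indicates which
source of uncertainty, returning students or new admissions,
dominates for a particular institution.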

It is unlikely that student input each period would be 
constant. In the next section we analyze the model assuming 
that new admissions follow a Poisson distribution. 
2. Poisson Admissions



The number of new students who actually enroll in a
given future semester is not known with certainty. A simple
method of modelling this uncertainty is to assume the number
of new enrollments follows a Poisson distribution. Let n(t)
be the expected number of new enrollments at time t. Then

$$\Pr[N(t) = m] = \frac{n(t)^{m}\, e^{-n(t)}}{m!} , \qquad m \ge 0 . \qquad (9)$$



From equations (2) and (9) we get

$$\Pr[S(t;u) = k] = \frac{[p_{i(u)}(u)\, n(t-u)]^{k}\, e^{-p_{i(u)}(u)\, n(t-u)}}{k!} . \qquad (10)$$

This shows that each random variable in (1) has a Poisson distribution,
which together with our independence assumption,
implies that the total enrollment at time t has a Poisson distribution
at every time t, with

$$E[S(t)] = \mathrm{Var}[S(t)] = \sum_{u=0}^{\infty} p_{i(u)}(u)\, n(t-u) .$$

Using our previous example, but with Poisson input
instead of fixed input, with n_1 = 3000, n_2 = 1000 and
p_1(u) = p_2(u) as in Table 1, we get again an expected enrollment
of 14,633 each fall and 13,203 each spring, but with variances
of the same values. Thus two standard deviations would be 242
each fall and 230 each spring, which show much more uncertainty
in the forecasts, as one would expect.
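That the variance equals the mean under Poisson admissions can also
be seen from equation (6): setting Var[N(t-u)] = E[N(t-u)] = n(t-u)
gives n p (1 - p) + p^2 n = p n for each term. The sketch below
checks this numerically; the function name is hypothetical and
p_1(u) = p_2(u) is assumed.

```python
from math import sqrt

# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def poisson_enrollment(n_fall, n_spring, fall=True):
    """Mean and variance of S(t) when each N(t) is Poisson.

    The mean follows equation (5); the variance follows equation (6)
    with Var[N] = E[N] = n for every cohort.
    """
    mean = 0.0
    var = 0.0
    for u, p in enumerate(P):
        same_season = (u % 2 == 0)  # cohort entered in the same season as t
        n = n_fall if same_season == fall else n_spring
        mean += p * n
        var += n * p * (1 - p) + p * p * n
    return mean, var


mean_fall, var_fall = poisson_enrollment(3000, 1000, fall=True)
mean_spring, var_spring = poisson_enrollment(3000, 1000, fall=False)
print(round(mean_fall), round(2 * sqrt(var_fall)))
print(round(mean_spring), round(2 * sqrt(var_spring)))
```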
3. Large Cohort Sizes



We have already shown in equation (2) that the number
of students attending out of a given entering cohort can be viewed
as the result of summing successes in Bernoulli trials, where
the probability of success is the probability that a student
attends in a given semester. Thus, if we add a finite number of
such random variables to obtain the attendance at a later time
period we again obtain a sum of successes in a finite number of
Bernoulli trials. If the parameter p_i(u) of the Binomial distribution
in (2) did not change with time, then it would also be
true that the sum in (1) is binomially distributed. This follows
from the derivation of the distribution of the sum of successes
in a finite number of Bernoulli trials, each trial having the
same probability of success. Unfortunately, that is not the
case; as we can easily see from Table 1 the parameter p_i(u)
changes rather dramatically with elapsed time since entry and
the resulting distribution is obtained from the convolution of
as many binomial distributions, with changing parameters, as
there are terms in (1). Although explicit expressions can be
found for the generating function of such distributions, algebraic
expressions for the distribution itself are not simple.
Fortunately, however, much can be said about the approximate
behavior of the conditional distribution of S(t) if we assume
that entering cohorts contain large numbers of students.

The central limit theorem of probability theory states
that if S(t;u) is the number of successes in n(t-u)
trials each with success probability p_i(u), then the normalized
sum

$$S^{*}(t;u) = \frac{S(t;u) - p_i(u)\, n(t-u)}{[\,p_i(u)\,(1 - p_i(u))\, n(t-u)\,]^{1/2}} \qquad (11)$$

is approximately normally distributed. If we write

$$\Phi(a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-v^{2}/2}\, dv$$

for the normal distribution function, then with large cohort
sizes, i.e., large numbers entering at t-u,

$$\Pr[S^{*}(t;u) \le a] \approx \Phi(a) \quad \text{independent of } u \text{ and } t. \qquad (12)$$



As long as each entering cohort is large and entering cohorts
act independently of one another the sum of a finite number of
terms in (1) is also approximately normal. In this case

$$\Pr[S^{*}(t) \le a] \approx \Phi(a) , \qquad (13)$$

where the normalization for S(t) is given by

$$S^{*}(t) = \frac{S(t) - \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u)}{\Big[ \sum_{u \ge 0} n(t-u)\, p_{i(u)}(u)\, (1 - p_{i(u)}(u)) \Big]^{1/2}} . \qquad (14)$$
Table 3 gives E[S(t;u)] for u = 0,1,...,16 and
E[S(t)] together with 95% confidence intervals. Also tabulated
is the length of the confidence intervals as a percentage of
the expected values. Fall and spring semesters are shown in
separate columns for clarity (again t is assumed to be a fall
semester). Note how the uncertainty as a percentage of the mean
increases with time enrolled, and how small the error is on
the total enrolled forecast compared to the individual semesters.



Equations (13) and (14) can be used to obtain more
information on the uncertainty in S(t); one can estimate the
probability of the enrollment exceeding any given figure, of
not exceeding any given figure, or of being in any given range.
Let a and b be any two numbers with a < b. Then for
n_1 = 3000, n_2 = 1000, t a fall semester, and the data given
in Table 1 with p_1(u) = p_2(u), the mean is 14,633 and the
standard deviation is √3841 ≈ 62 (Table 2), so that

$$\Pr[a < S(t) < b] \approx \Phi\!\left(\frac{b - 14{,}633}{62}\right) - \Phi\!\left(\frac{a - 14{,}633}{62}\right) .$$

From tables of the normal distribution we see that

$$\begin{aligned} P[S(t) < 14{,}700] &\approx 0.86 , \\ P[S(t) \ge 14{,}500] &\approx 0.98 , \\ P[14{,}500 \le S(t) \le 14{,}700] &\approx 0.84 . \end{aligned} \qquad (15)$$
             E[S(t;u)] and 95%             Confidence Interval as
             Confidence Interval           % of E[S(t;u)]

Time u       Fall           Spring         Fall        Spring

   0         3000 ± 0                        0
   1                         972 ± 10                    2.1
   2         2715 ± 32                       2.4
   3                         756 ± 27                    7.1
   4         2052 ± 51                       5.0
   5                         593 ± 31                   10.5
   6         1686 ± 54                       6.4
   7                         524 ± 32                   12.2
   8         1494 ± 55                       7.4
   9                         199 ± 25                   25.1
  10          390 ± 37                      19.0
  11                          50 ± 14                   56.0
  12          108 ± 20                      37.0
  13                          17 ± 8                    94.1
  14           45 ± 13                      57.8
  15                          11 ± 7                   127.3
  16           21 ± 9                       85.7

 Total       14,633 ± 124                    1.7


TABLE 3: Forecasts and confidence intervals for each semester
enrollment, n_1 = 3000, n_2 = 1000, and t a fall semester.
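The rows of Table 3 can be regenerated from Table 1 with the sketch
below. Two points are assumptions inferred from the tabulated
numbers rather than stated in the text: the half-width of each
interval is taken as two standard deviations rounded to a whole
number of students, and the percentage is the full interval length
(twice that rounded half-width) divided by the mean. The function
name is hypothetical and p_1(u) = p_2(u) is assumed.

```python
from math import sqrt

# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def table3_rows(n_fall=3000, n_spring=1000):
    """Rows (u, mean, half_width, percent) for t a fall semester."""
    rows = []
    total_mean = 0.0
    total_var = 0.0
    for u, p in enumerate(P):
        n = n_fall if u % 2 == 0 else n_spring  # even u: fall-entering cohort
        mean = n * p
        var = n * p * (1 - p)                   # binomial variance, equation (4)
        half = round(2 * sqrt(var))             # assumed: 2 sigma, whole students
        rows.append((u, round(mean), half, round(100 * 2 * half / mean, 1)))
        total_mean += mean
        total_var += var
    half = round(2 * sqrt(total_var))
    rows.append(("Total", round(total_mean), half,
                 round(100 * 2 * half / total_mean, 1)))
    return rows


for u, mean, half, pct in table3_rows():
    season = "" if u == "Total" else ("Fall" if u % 2 == 0 else "Spring")
    print(f"{str(u):>5}  {season:<6}  {mean:6d} ± {half:<4d} {pct:6.1f}")
```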
The Normal approximation for S(t) still holds if the
admissions each semester are assumed to be Poisson, since the
total enrollment is the sum of independent Poisson random
variables with distribution given by (10). In this case the
variance equals the mean, and we consider

$$S^{*}(t) = \frac{S(t) - \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u)}{\Big[ \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u) \Big]^{1/2}} .$$

For fall Poisson inputs with mean 3000, spring Poisson inputs
with mean 1000, t a fall semester, and assuming p_1(u) = p_2(u)
as given in Table 1, the standard deviation is √14,633 ≈ 121 and

$$P[a < S(t) < b] \approx \Phi\!\left(\frac{b - 14{,}633}{121}\right) - \Phi\!\left(\frac{a - 14{,}633}{121}\right) .$$

In this case

$$\begin{aligned} P[S(t) < 14{,}700] &\approx 0.71 , \\ P[S(t) \ge 14{,}500] &\approx 0.86 , \\ P[14{,}500 \le S(t) \le 14{,}700] &\approx 0.57 . \end{aligned} \qquad (16)$$

A comparison of (15) and (16) shows the added uncertainty in
the forecast due to randomness in the numbers of admissions.
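The probabilities in (15) and (16) can be recomputed without normal
tables by expressing Φ through the error function. The sketch below
is illustrative: the function names are hypothetical, p_1(u) = p_2(u)
is assumed, and the variance is computed from the unrounded Table 1
products, so the standard deviations differ slightly from the rounded
values 62 and 121 without changing the two-decimal results.

```python
from math import erf, sqrt

# Attendance fractions p_1(u), u = 0..16, from Table 1 (p_2 = p_1 assumed).
P = [1.0, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def moments(n_fall, n_spring, poisson):
    """Mean and variance of S(t) for t a fall semester."""
    mean = 0.0
    var = 0.0
    for u, p in enumerate(P):
        n = n_fall if u % 2 == 0 else n_spring
        mean += p * n
        # Poisson input: variance equals mean; fixed input: binomial variance.
        var += p * n if poisson else n * p * (1 - p)
    return mean, var


def summary(poisson, low=14500, high=14700):
    """P[S < high], P[S >= low], P[low <= S <= high] under the normal approximation."""
    mean, var = moments(3000, 1000, poisson)
    s = sqrt(var)
    below = phi((high - mean) / s)
    above = 1.0 - phi((low - mean) / s)
    between = below - phi((low - mean) / s)
    return below, above, between


print("fixed input  (15):", [round(x, 2) for x in summary(poisson=False)])
print("Poisson input (16):", [round(x, 2) for x in summary(poisson=True)])
```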